One of the crucial steps in scientific studies is to specify the dependent relationships among factors in a system of interest. Given little knowledge of a system, can we characterize its underlying dependent relationships by observing its temporal behavior? In multivariate systems, many possible dependency structures can be confused with each other, which may cause false detection of illusory dependency between unrelated factors. The present study proposes a new information-theoretic measure that takes such potential multivariate relationships into account. The proposed measure, called multivariate transfer entropy, is an extension of transfer entropy, a measure of temporal predictability. In simulations and empirical studies, we demonstrate that the proposed measure characterizes the latent dependent relationships in unknown dynamical systems more accurately than the original (pairwise) transfer entropy.

Time Series Analysis | Information Theory | System characterization 

One of the crucial steps in scientific studies is the characterization of a system of interest: the specification of dependent relationships among factors or subcomponents in the system [1]. Characterization is often an early stage of analysis, preceding a more specific description of the system, and it requires little or no prior knowledge of the system's underlying mechanism. In the present study, we propose a measure that may serve for such early-stage characterization of dependency in a multivariate system about which little is known, based on observation of its temporal behavior.

The basic problem addressed in the present study is how to quantify and detect dependency between each pair of variables in a multivariate system. In a multivariate system, a variable X generated by a stochastic or deterministic process is said to be conditionally dependent on a variable Y given Z if the future state of X is partially determined by Y given Z. In particular, we consider the temporal behavior of a system in which we measure dependency from the set of N variables at time t to the same set of N variables at time t + 1. In this formulation, the temporal dependency of a system is characterized by the conditional dependency between $X_t$ and $Y_{t+1}$ given $Z_t$.

In a linear system, which can be decomposed into separable subcomponents without interaction among them, characterization of temporal dependency is straightforward: for stationary linear processes, auto- and cross-correlation sufficiently characterize the linear properties. In contrast, a nonlinear system, with or without a stochastic component, is not decomposable into subcomponents and needs to be characterized with consideration of the interactions among its subcomponents. A set of information-theoretic measures has been proposed as a nonlinear counterpart of auto- and cross-correlation [2]. One such measure, transfer entropy, has found a wide range of applications and has successfully characterized many empirical systems [3]. In the present study, we propose an extended information-theoretic measure, called multivariate transfer entropy (MTE), for characterizing multivariate dependency. The proposed measure is a natural generalization of transfer entropy that takes into account potential confounding relationships among three or more variables in multivariate temporal dynamics. We therefore introduce the extended measure after a brief overview of the related development of information theory.

Information Theory. Ever since its establishment by Shannon [4], information theory has played a crucial role in the mathematical modeling of communication. Shannon's original formulation concerns unidirectional information transmission through a noisy channel. In this formulation, a message X generated by an information source is sent to a receiver, which receives it as a message Y through a stochastic channel in which the original message X may be changed into Y with some probability. This unidirectional transmission is described with the mathematical concepts of entropy and mutual information of the probability distributions of the sent and received messages X and Y. Entropy quantifies the stochastic uncertainty of an information source by the length of the codes encoding the messages it generates. Mutual information quantifies the difference in uncertainty between two ways of coding: the random variable Y alone, and Y with additional knowledge of another variable X. Mutual information is maximized when X = Y (a noiseless channel) and reaches its minimum of zero when X and Y are independent. Thus, mutual information characterizes the properties of the noisy channel between X and Y, and it provides a mathematical foundation for communication theory concerning the design of an optimal information channel under given constraints. Through the concepts of entropy, mutual information, and their variants, information theory covers and connects a wide range of fields and problems, such as nonlinear dynamical systems, thermodynamics, electrical engineering, probability theory, statistics, mathematics, economics, computer science, and the philosophy of science [5].
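As a minimal numerical illustration of the two extremes just described, the sketch below assumes a binary symmetric channel (a textbook channel model, not one used in this study) with a uniform binary input X, where each bit is flipped with probability `flip_prob`. Because the output Y is then also uniform, $I(X;Y) = H(Y) - H(Y \mid X) = 1 - H_b(p)$ bits, where $H_b$ is the binary entropy. The function names are illustrative.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy (bits) of a Bernoulli(p) variable, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_mutual_information(flip_prob: float) -> float:
    """I(X;Y) in bits for a uniform binary input through a binary symmetric channel.

    With uniform X the output Y is uniform as well, so H(Y) = 1 bit and
    H(Y|X) = H_b(flip_prob); the mutual information is their difference.
    """
    return 1.0 - binary_entropy(flip_prob)

for p in (0.0, 0.1, 0.25, 0.5):
    print(f"flip probability {p:.2f}: I(X;Y) = {bsc_mutual_information(p):.3f} bits")
```

The printed values fall from 1 bit for the noiseless channel (X = Y) to 0 bits at a flip probability of 0.5, where X and Y are independent.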

Despite its mathematical elegance and many successful applications, Shannon pointed out in 1973 the theoretical limitation of unidirectional information transmission and anticipated an extension of information theory to include feedback [6]. Indeed, in the very same year, Marko [7] proposed an extended, bidirectional information network as Shannon had suggested. In this bidirectional network, two information sources send and receive information, and the efficiency of communication is characterized by the loss of information in the bidirectional information transfer. Although both Shannon and Marko highlighted the importance of bidirectional information, it was not well recognized in the field until the 1990s [6]. More recently, Kantz and Schreiber [2, 3] reintroduced a directed measure of statistical dependency, called transfer entropy, which is part of Marko's bidirectional information theory. One of the major advantages of transfer entropy, or bidirectional information transmission, over Shannon's unidirectional measure is that it distinguishes which factor leads and which follows, treating each direction separately. Since its reintroduction, transfer entropy has been applied not only in engineering fields relevant to information theory but also in various kinds of scientific research: detecting directed dependency in cellular automata [8], machine learning [9], chemical processes [10], health monitoring [11], analysis of brain activity [12, 13], stock markets [14, 15], ecological monitoring programs [16], music analysis [17], and human-human and human-robot communication [18, 19, 20].

Limitation of the transfer entropy. Despite its successful applications, a potential limitation of transfer entropy has not been well recognized: a naive application of transfer entropy to three or more variables may lead to an inaccurate characterization of a system. In the present study, we demonstrate this limitation and propose multivariate transfer entropy (MTE), a further extension of Marko's bidirectional information theory, as a solution. MTE takes into account potential confounding relationships among three or more variables that transfer entropy ignores, and it thus extends the usability of transfer entropy to more general situations with an arbitrary topological structure of dependency among N variables. In this regard, MTE is a nonlinear analogue of partial correlation, which removes the linear confounding effects of variables other than the focal pair. To distinguish it explicitly from MTE, we hereafter refer to the original measure as pairwise transfer entropy (PTE), the special case for a bivariate system. In the following section, we give a formal description of mutual information, transfer entropy, and their relationship.
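The confounding problem described above can be reproduced with a hypothetical three-variable example, sketched below under simplifying assumptions: binary variables, plug-in probability estimates, and a history of one time step. A common driver Z sets X one step later ($X_{t+1} = Z_t$), while Y is a noisy simultaneous copy of Z; Y has no influence on X. For comparison with PTE, the sketch also conditions on Z's past. This conditional transfer entropy is only a stand-in for the idea of accounting for other variables; it is not claimed to be the MTE defined later in Equation (8).

```python
import numpy as np
from collections import Counter

def entropy(*cols):
    """Plug-in joint entropy (bits) of aligned discrete columns; H() = 0."""
    if not cols:
        return 0.0
    keys = zip(*(np.asarray(c).tolist() for c in cols))
    counts = np.array(list(Counter(keys).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def cmi(a, b, cond=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(C) - H(A,B,C); cond is a sequence of columns."""
    return entropy(a, *cond) + entropy(b, *cond) - entropy(*cond) - entropy(a, b, *cond)

rng = np.random.default_rng(0)
T = 20_000
z = rng.integers(0, 2, T)                     # common driver Z: i.i.d. fair bits
y = np.where(rng.random(T) < 0.1, 1 - z, z)   # Y_t: copy of Z_t, flipped with prob. 0.1
x = np.zeros(T, dtype=z.dtype)
x[1:] = z[:-1]                                # X_{t+1} = Z_t: only Z drives X

x_next, x_now, y_now, z_now = x[1:], x[:-1], y[:-1], z[:-1]
pte_y_to_x = cmi(x_next, y_now, [x_now])              # pairwise TE, Y -> X
cte_y_to_x = cmi(x_next, y_now, [x_now, z_now])       # also conditioned on Z's past
te_z_to_x = cmi(x_next, z_now, [x_now])               # the genuine link Z -> X
print(f"PTE Y->X = {pte_y_to_x:.3f} bits (spurious)")
print(f"TE Y->X given Z = {cte_y_to_x:.3f} bits")
print(f"PTE Z->X = {te_z_to_x:.3f} bits")
```

The spurious PTE from Y to X is roughly half a bit, while conditioning on the driver's past removes it. With real data, the number of conditioning variables that plug-in estimates can support is limited by the sample size.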



Mutual information and transfer entropy 

Consider a unidirectional information transmission through a noisy channel in which a message X generated by an information source is sent to a receiver. Let $p(x_i)$ be the probability distribution of a message $X = x_i \in \mathcal{M} = \{1, 2, \ldots, M\}$, where $\mathcal{M}$ is an alphabet of M symbols, and suppose the information source draws a series of messages from this distribution. Then the entropy, $H(X) = -\sum_i p(x_i) \log p(x_i)$, gives the asymptotic minimum average code length, achieved by assigning a code of length $-\log p(x_i)$ to each message $x_i$ in an infinitely long series of messages.

Now suppose that we assign a code set Q of length $-\log q(x_i)$, instead of the minimum code set P of length $-\log p(x_i)$, to the message $x_i$ with probability $p(x_i)$. The average difference between the code lengths of Q and P, called relative entropy or Kullback-Leibler divergence, $D(P \| Q) = -\sum_i p(x_i) \left( \log q(x_i) - \log p(x_i) \right)$, can be treated as the amount of coding error, or the difference in stochastic uncertainty of $q(X)$ relative to $p(X)$. In the unidirectional information transmission described above, the entropy $H(Y)$ quantifies the stochastic uncertainty of the received message Y. Likewise, the conditional entropy, $H(Y \mid X) = -\sum_{i,j} p(x_i, y_j) \log p(y_j \mid x_i)$, quantifies the uncertainty of Y given knowledge of X. Mutual information $I(X;Y) = H(Y) - H(Y \mid X)$ is then defined as the difference between the entropy of Y and the conditional entropy $H(Y \mid X)$, or symmetrically between the entropy of X and the conditional entropy $H(X \mid Y)$. Mutual information can be interpreted as the information gained by obtaining the shorter code length $H(Y \mid X)$ for Y with additional knowledge of X, relative to the code length $H(Y)$ for Y alone.
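The definitions above translate directly into code. The sketch below (function names are illustrative) computes entropy, Kullback-Leibler divergence, conditional entropy, and mutual information in bits from explicit probability tables. For p = (1/2, 1/4, 1/4), the optimal code lengths are 1, 2, and 2 bits, so H = 1.5 bits; coding with a uniform q instead costs $\log_2 3 - 1.5 \approx 0.085$ extra bits per message on average.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy -sum p log2 p of a probability array (zero cells skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def kl_divergence_bits(p, q):
    """D(P||Q) = -sum_i p_i (log2 q_i - log2 p_i): the average extra code length
    paid when P-distributed messages are coded with lengths -log2 q_i."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = p > 0
    return float(-(p[m] * (np.log2(q[m]) - np.log2(p[m]))).sum())

def conditional_entropy_bits(pxy):
    """H(Y|X) = H(X,Y) - H(X) for a joint table pxy[i, j] = p(x_i, y_j)."""
    return entropy_bits(pxy) - entropy_bits(pxy.sum(axis=1))

def mutual_information_bits(pxy):
    """I(X;Y) = H(Y) - H(Y|X)."""
    return entropy_bits(pxy.sum(axis=0)) - conditional_entropy_bits(pxy)

p = np.array([0.5, 0.25, 0.25])
q = np.full(3, 1 / 3)
print(entropy_bits(p))                 # 1.5
print(kl_divergence_bits(p, q))        # log2(3) - 1.5, about 0.085

# Hypothetical noisy channel: uniform X, Y = X flipped with probability 0.1
pxy = np.array([[0.45, 0.05],
                [0.05, 0.45]])
print(mutual_information_bits(pxy), mutual_information_bits(pxy.T))   # symmetric
```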

Transfer entropy and bidirectional information network. Marko [7] reinterpreted Shannon's unidirectional transmission as a network of information flows and showed its bidirectional extension. Figure 1a depicts the unidirectional information transmission interpreted as network flows¹. In the network, the mutual information $I(X^T; Y^T)$, a part of the entropy of the information source $X^T$, flows into $Y^T$. The entropy of the received message $Y^T$ is the sum of the incoming flow $I(X^T; Y^T)$ and the uncertainty of $Y^T$ without $X^T$, the conditional entropy $H(Y^T \mid X^T)$.

This unidirectional information network is a special case of the bidirectional network (Figure 1b). We follow Marko's terminological conventions except for the terms entropy rate and (pairwise) transfer entropy, which have become more standard since [5, 3]. Unlike the formulation originally proposed in [7], the following formulation assumes neither stationarity nor Markovity of the time series in theory². Suppose we have two series of random variables $\bar{X}^T = \{X^1, X^2, \ldots, X^T\}$ and $\bar{Y}^T = \{Y^1, Y^2, \ldots, Y^T\}$ over discrete time $t = 1, 2, \ldots, T$, where the bar denotes the set of random variables with superscripts specifying the time indices from time 1 to time T. As in unidirectional communication, we start with a measure of uncertainty in a single variable. The entropy rate $H_X^T$ is the sum of the conditional entropies of $X^t$ given its past states $\bar{X}^{t-1}$ for $t = 1, 2, \ldots, T$ [5].

$$H_X^T = \sum_{t=1}^{T} H\left(X^t \mid \bar{X}^{t-1}\right) \tag{1}$$

where $H(X^t \mid \emptyset) = H(X^t)$ and $\bar{X}^{t} = \emptyset$ for $t < 1$. Normalized by the length T of the time series, $H_X^T / T$ is the average increase in the entropy of X per time step. Similarly, the sum of the uncertainties of X at time t conditioned on knowledge of the past states $\{\bar{X}^{t-1}, \bar{Y}^{t-1}\}$ for $t = 1, 2, \ldots, T$ is called the free entropy $F_X^T$, formally defined as follows.

$$F_X^T = \sum_{t=1}^{T} H\left(X^t \mid \bar{X}^{t-1}, \bar{Y}^{t-1}\right) \tag{2}$$

Similarly, we write $F_Y^T = \sum_{t=1}^{T} H(Y^t \mid \bar{X}^{t-1}, \bar{Y}^{t-1})$. The pairwise transfer entropy from Y to X at time T is defined as the sum of the reductions in uncertainty of $X^t$ when knowledge of the past states of both variables $\{\bar{X}^{t-1}, \bar{Y}^{t-1}\}$, rather than of $\bar{X}^{t-1}$ alone, is available for $t = 1, 2, \ldots, T$.

$$T_{Y \to X}^T = H_X^T - F_X^T = \sum_{t=1}^{T} I\left(X^t; \bar{Y}^{t-1} \mid \bar{X}^{t-1}\right) \tag{3}$$

where $I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)$ is the conditional mutual information between X and Y given Z, and $I(X; Y \mid \emptyset) = I(X; Y)$. Similarly, $T_{X \to Y}^T = H_Y^T - F_Y^T$, and $T_{X \to Y}^T \neq T_{Y \to X}^T$ in general. Transfer entropy can be interpreted as directed "information transmission" from Y to X, since $H(X^t \mid \bar{X}^{t-1}) \ge H(X^t \mid \bar{X}^{t-1}, \bar{Y}^{t-1})$, with equality for all $t = 1, 2, \ldots, T$ (so that $T_{Y \to X}^T = 0$) if and only if each $X^t$ is conditionally independent of the past states $\bar{Y}^{t-1}$ of the other variable given $\bar{X}^{t-1}$.
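Equation (3) sums over the full histories $\bar{X}^{t-1}$ and $\bar{Y}^{t-1}$. The sketch below instead assumes, as practical estimators commonly do given finite samples, a stationary first-order Markov approximation in which each history is truncated to one step, and it estimates probabilities by plug-in frequencies of discrete symbols. It therefore returns an estimate of the per-step average $T_{Y \to X}^T / T$ rather than the exact quantity. The function and variable names are illustrative.

```python
import numpy as np
from collections import Counter

def plugin_entropy(*cols):
    """Plug-in joint entropy (bits) of aligned discrete columns."""
    keys = zip(*(np.asarray(c).tolist() for c in cols))
    counts = np.array(list(Counter(keys).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def transfer_entropy(source, target):
    """Per-step pairwise TE from `source` (Y) to `target` (X), history length 1:
    H(X_{t+1} | X_t) - H(X_{t+1} | X_t, Y_t), each conditional entropy written
    as a difference of joint entropies."""
    x_next, x_now, y_now = target[1:], target[:-1], source[:-1]
    h_self = plugin_entropy(x_next, x_now) - plugin_entropy(x_now)
    h_both = plugin_entropy(x_next, x_now, y_now) - plugin_entropy(x_now, y_now)
    return h_self - h_both

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 10_000)   # source: i.i.d. fair bits
x = np.zeros_like(y)
x[1:] = y[:-1]                   # target copies the source one step later

print(f"T(Y->X) = {transfer_entropy(y, x):.3f} bits")   # close to 1 bit
print(f"T(X->Y) = {transfer_entropy(x, y):.3f} bits")   # close to 0 bits
```

The asymmetry illustrates that $T_{X \to Y} \neq T_{Y \to X}$ in general. Plug-in estimates are biased upward for finite samples, so small positive values such as the second one need a significance test before being read as an information flow.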

Network properties of transfer entropy. Marko [7] pointed out that the relationship among entropy rate, free entropy, and transfer entropy can be viewed as a bidirectional information network (Figure 1b). The bidirectional network has two variables X and Y, which send and receive messages between them. Each directed edge in the network reflects an information flow with a non-negative value of the corresponding entropy rate (solid line), free entropy (solid line), or transfer entropy (broken line). At each node, the total amount of incoming information flow is identical to the total amount of outgoing flow (Kirchhoff's current law). The entropy rate $H_X^T$ is the sum of the free entropy $F_X^T$ (new information up to T) and the transfer entropy $T_{Y \to X}^T$ (information from the past states of the other variable up to T - 1). A certain part $T_{X \to Y}^T$ of the entropy rate $H_X^T$ is transferred to Y, and the rest, called the residual entropy $R_X^T = H_X^T - T_{X \to Y}^T$, flows out of the network. Similarly, information is transferred from Y to X. At each node and edge in the bidirectional network for the two variables X and Y, two properties hold: non-negativity of information flows and Kirchhoff's current law. We refer to these two properties as network constraints. For all the information flows to be non-negative, the following inequalities must be satisfied: $R_X^T = H_X^T - T_{X \to Y}^T \ge 0$ and, symmetrically, $R_Y^T = H_Y^T - T_{Y \to X}^T \ge 0$. In [7], the following stronger inequality was suggested without proof.

$$\min\left(H_X^T, H_Y^T\right) \ge T_{X \to Y}^T + T_{Y \to X}^T \tag{4}$$

We prove a more general version of this inequality for an N-variable system ($N \ge 2$) in this study.

¹ In the network, the superscript T denotes a set of time indices, and $X^T = \{X^1, \ldots, X^T\}$ is a group of variables for which we obtain Shannon's entropy by identifying $X = X^T$.
² However, in practice, estimation of transfer entropy often requires these properties because of the finite sample size of a dataset. Marko [7] considered the limit of an infinitely long time series, which we can obtain by letting $T \to \infty$ in the present formulation for time series of finite length. In this paper, we work simply with the finite-T quantities, since the limit is not essential to the arguments presented here.

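Transfer entropy as decomposition of mutual information. Another property of PTE is that it serves as a partial factor decomposing mutual information, together with a certain residual³.

$$I\left(\bar{X}^T; \bar{Y}^T\right) = T_{X \to Y}^T + T_{Y \to X}^T + R_{X,Y}^T \tag{5}$$

where $R_{X,Y}^T = \sum_{t=1}^{T} I\left(X^t; Y^t \mid \bar{X}^{t-1}, \bar{Y}^{t-1}\right) \ge 0$ is non-negative owing to the non-negativity of conditional mutual information. Equation (5) explicitly shows that PTE extends mutual information: in the special case without feedback, $T_{Y \to X}^T + R_{X,Y}^T = 0$ (or $T_{X \to Y}^T + R_{X,Y}^T = 0$), the mutual information reduces to a single transfer entropy.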
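For a two-step series (T = 2), the sums in Eqs. (1)-(3) and (5) each have two terms, so the decomposition can be checked exactly on any joint distribution. The sketch below draws a random, purely hypothetical joint pmf over four binary variables $(X^1, X^2, Y^1, Y^2)$ and evaluates every term with exact entropies. It also checks inequality (4), which follows from Eq. (5) in the bivariate case: because $R_{X,Y}^T \ge 0$, we have $T_{X \to Y}^T + T_{Y \to X}^T \le I(\bar{X}^T; \bar{Y}^T) \le \min(H(\bar{X}^T), H(\bar{Y}^T))$, and by the chain rule $H(\bar{X}^T) = H_X^T$ and $H(\bar{Y}^T) = H_Y^T$.

```python
import numpy as np

NAMES = ("X1", "X2", "Y1", "Y2")            # axis order of the joint pmf
rng = np.random.default_rng(2)
p = rng.random((2, 2, 2, 2))                # hypothetical joint pmf, T = 2
p /= p.sum()

def H(*names):
    """Exact joint entropy (bits) of the named variables, marginalizing the rest."""
    if not names:
        return 0.0
    drop = tuple(k for k, n in enumerate(NAMES) if n not in names)
    marg = (p.sum(axis=drop) if drop else p).ravel()
    marg = marg[marg > 0]
    return float(-(marg * np.log2(marg)).sum())

def I(a, b, c=()):
    """Conditional mutual information I(A; B | C) for tuples of variable names."""
    return H(*a, *c) + H(*b, *c) - H(*c) - H(*a, *b, *c)

# The t = 1 terms of Eq. (3) vanish because the past is empty.
T_y_to_x = I(("X2",), ("Y1",), ("X1",))
T_x_to_y = I(("Y2",), ("X1",), ("Y1",))
R_xy = I(("X1",), ("Y1",)) + I(("X2",), ("Y2",), ("X1", "Y1"))
mi = I(("X1", "X2"), ("Y1", "Y2"))
H_x = H("X1") + (H("X1", "X2") - H("X1"))                 # Eq. (1)
F_x = H("X1") + (H("X1", "X2", "Y1") - H("X1", "Y1"))     # Eq. (2)
H_y = H("Y1") + (H("Y1", "Y2") - H("Y1"))

print(f"I = {mi:.6f}, T(X->Y) + T(Y->X) + R = {T_x_to_y + T_y_to_x + R_xy:.6f}")
print(f"H_X = {H_x:.6f}, F_X + T(Y->X) = {F_x + T_y_to_x:.6f}")
```

Both printed pairs agree up to floating-point rounding for any joint pmf, since the identities are exact consequences of the chain rule.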

Multivariate bidirectional information network

Here we outline an extension of pairwise transfer entropy to more general cases with three or more information sources. Technically, many multivariate extensions of PTE are possible. However, the proposed extension is justified not only by its applicability to multivariate dependency but also by retaining the two network constraints that PTE satisfies. Analogous to PTE decomposing mutual information, MTE decomposes total correlation, a multivariate extension of mutual information [21] (also called multivariate constraint [22] or multi-information [23]). MTE can also be viewed as part of a bidirectional information network among N variables with non-negative flows obeying Kirchhoff's current law. In the following sections, we formulate a multivariate information network and outline its theoretical properties. See the Supporting Information for a more detailed description and the mathematical proofs of the theorems.

Formulation of multivariate information network. In a generalized information network with N variables, each variable is associated with two nodes, an incoming node and an outgoing node (Figure 1d). The incoming and outgoing nodes of variable i respectively receive information from, and send information to, all variables other than i. The information flows between the incoming and outgoing nodes of variable i are the entropy rate and the free entropy of variable i. At the outgoing node, some amount of information, called residual entropy, is lost without being transferred to the other variables. A special case of the information network for a three-variable system is shown in Figure 1c.

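Let $X_{\mathcal{N}}^{\mathcal{T}}$ be a set of $N \times T$ random variables indexed with the variable index set $\mathcal{N} = \{1, 2, \ldots, N\}$ and the time index set $\mathcal{T} = \{1, 2, \ldots, T\}$. We then write $X_{\mathcal{N}}^{\mathcal{T}} = \{X_1^{\mathcal{T}}, X_2^{\mathcal{T}}, \ldots, X_N^{\mathcal{T}}\} = \{X_{\mathcal{N}}^1, X_{\mathcal{N}}^2, \ldots, X_{\mathcal{N}}^T\}$, where $X_i^{\mathcal{T}} = \{X_i^1, X_i^2, \ldots, X_i^T\}$ is the time-cumulative subset for variable i up to time index T, and $X_{\mathcal{N}}^t = \{X_1^t, X_2^t, \ldots, X_N^t\}$ is the set of all variables in $\mathcal{N}$ at time t. Given the set of random variables $X_{\mathcal{N}}^{\mathcal{T}}$, the entropy rate, free entropy, multivariate transfer entropy, and residual entropy are defined as follows. The cumulative sum of the entropy rates of variable i at time T is defined as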

$$H_i^T = \sum_{t=1}^{T} H\left(X_i^t \mid X_i^{\bar{t} \setminus t}\right) \tag{6}$$

where $\bar{t} = \{1, 2, \ldots, t\}$ is the cumulative set of time indices and $\bar{t} \setminus t = \{1, 2, \ldots, t-1\}$ denotes the removal of index t from $\bar{t}$ with the set subtraction operator "$\setminus$". This is identical to the entropy rate defined for a bivariate system [7, 3]. The free entropy of variable i at time T in the N-variable system, which is the uncertainty of $X_i$ given all the past states of the N variables $X_{\mathcal{N}}^{\bar{t} \setminus t}$, is defined as follows.

$$F_i^T = \sum_{t=1}^{T} H\left(X_i^t \mid X_{\mathcal{N}}^{\bar{t} \setminus t}\right) \tag{7}$$

The multivariate transfer entropy from variable j to i given the set of the other variables $\mathcal{N} \setminus \{i, j\}$ in the N-variable system $X_{\mathcal{N}}^{\mathcal{T}}$ is defined as follows.

T 

lf_,|Ar\o,,} = E / (X*; X*\*|X^(,) [8] 
t=i 

Residual information from variable i in the N-variable system $X_{\mathcal{N}}^{\mathcal{T}}$ is defined as follows.

T 

RL^^l{Xl'Xhx'M\{i,j}) [9] 
t=l 

By construction, each of these entropies and informations is non-negative: $H_i^T \ge 0$, $F_i^T \ge 0$, $T_{j \to i \mid \mathcal{N} \setminus \{i,j\}}^T \ge 0$, and $R_{ij}^T \ge 0$ for arbitrary $i \in \mathcal{N}$ and $j \in \mathcal{N} \setminus \{i\}$. In the special case of a bivariate system (N = 2), this network agrees with the bidirectional information network of [7].

Properties of multivariate information network. The multivariate information network has two major properties, one holistic and one local, stated in the two theorems below. The first theorem states that, as a whole, a multivariate information network can be viewed as a decomposition of the total correlation among N variables [21, 22, 23]. The second theorem states that every node in the network obeys Kirchhoff's current law, that is, the sum of incoming information flows equals the sum of outgoing information flows (Figure 1d). In addition, each information flow in the network is always non-negative, which allows us to interpret each informational quantity (entropy rate, free entropy, transfer entropy, and residual entropy) as an amount of information flow. The non-negativity of the flows is not trivial when Theorems 1 and 2 are satisfied simultaneously. The present paper proves that MTE as defined in Equations (6-9) has the theoretical properties stated in the following theorems⁴ (see the Supporting Information for details).

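Theorem 1 (Decomposition of total correlation). In an N-variable system, the total correlation equals the sum of all the multivariate transfer entropies and residual entropies.

$$C\left(X_{\mathcal{N}}^{\mathcal{T}}\right) = \sum_{i=1}^{N} \sum_{j=1}^{i-1} G_{ij} \tag{10}$$

where $C(X_{\mathcal{N}}^{\mathcal{T}}) = \sum_{i=1}^{N} H(X_i^{\mathcal{T}}) - H(X_{\mathcal{N}}^{\mathcal{T}})$ is the total correlation, and $G_{ij} = T_{j \to i \mid \mathcal{N} \setminus \{i,j\}}^T + T_{i \to j \mid \mathcal{N} \setminus \{i,j\}}^T + R_{ij}^T$ is the sum of the transfer entropies and the residual entropy between variables i and j.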
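Total correlation, the quantity decomposed in Eq. (10), can be computed directly from a joint pmf. The sketch below treats each array axis as one variable; in Eq. (10) each axis would index a whole trajectory $X_i^{\mathcal{T}}$, which is impractical beyond toy sizes. The three hypothetical distributions show the independent case (zero), exact copies (maximal redundancy), and an XOR relation, for which total correlation is 1 bit even though every pair of the three variables is independent.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a probability array; zero cells are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def total_correlation_bits(joint):
    """C = sum_i H(X_i) - H(X_1, ..., X_N) for a joint pmf with one axis per variable."""
    n = joint.ndim
    marginals = [joint.sum(axis=tuple(a for a in range(n) if a != i)) for i in range(n)]
    return sum(entropy_bits(m) for m in marginals) - entropy_bits(joint)

independent = np.full((2, 2, 2), 1 / 8)
copies = np.zeros((2, 2, 2))
copies[0, 0, 0] = copies[1, 1, 1] = 0.5        # X1 = X2 = X3, a fair bit
xor = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        xor[a, b, a ^ b] = 0.25                # X3 = X1 XOR X2, X1 and X2 fair and independent

for name, joint in [("independent", independent), ("copies", copies), ("xor", xor)]:
    print(name, total_correlation_bits(joint))
```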

Theorem 2 (Local information flow). The entropy rate of variable i consists of the free entropy of variable i and the sum of the incoming multivariate transfer entropies to variable i:

$$H_i^T = F_i^T + \sum_{j \in \mathcal{N} \setminus \{i\}} T_{j \to i \mid \mathcal{N} \setminus \{i,j\}}^T \tag{11}$$

The entropy rate of variable i can also be decomposed locally into the sum of all the incoming and outgoing multivariate transfer entropies and residual entropies.

Hi = H [xT\Xjr\i) + J2 [12] 
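Equation (11) implies that the total incoming transfer to variable i equals $H_i^T - F_i^T$, which can be estimated without evaluating the individual MTE terms. The sketch below does so under the assumptions of a practical estimator (plug-in frequencies, stationarity, and a one-step history in place of the full past); as the footnote on empirical estimation cautions, the theorem need not hold exactly for such estimates. The hypothetical system is $x_0^{t+1} = x_1^t \oplus x_2^t$ (XOR) with independent fair bits $x_1$ and $x_2$: neither input alone carries information about $x_0$, so both pairwise transfer entropies are close to zero, while the total incoming transfer is close to 1 bit.

```python
import numpy as np
from collections import Counter

def plugin_entropy(*cols):
    """Plug-in joint entropy (bits) of aligned discrete columns."""
    keys = zip(*(np.asarray(c).tolist() for c in cols))
    counts = np.array(list(Counter(keys).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_rate_and_free_entropy(data, i):
    """One-step estimates of the per-step H_i (Eq. 6) and F_i (Eq. 7) for column i
    of an integer array `data` with shape (time, variables)."""
    nxt, now = data[1:], data[:-1]
    past_all = [now[:, k] for k in range(data.shape[1])]
    h_i = plugin_entropy(nxt[:, i], now[:, i]) - plugin_entropy(now[:, i])
    f_i = plugin_entropy(nxt[:, i], *past_all) - plugin_entropy(*past_all)
    return h_i, f_i

def pairwise_te(data, j, i):
    """One-step plug-in PTE from column j to column i."""
    nxt, now = data[1:], data[:-1]
    tgt_next, tgt_now, src_now = nxt[:, i], now[:, i], now[:, j]
    return (plugin_entropy(tgt_next, tgt_now) - plugin_entropy(tgt_now)
            - plugin_entropy(tgt_next, tgt_now, src_now) + plugin_entropy(tgt_now, src_now))

rng = np.random.default_rng(3)
T = 20_000
data = np.zeros((T, 3), dtype=int)
data[:, 1] = rng.integers(0, 2, T)
data[:, 2] = rng.integers(0, 2, T)
data[1:, 0] = data[:-1, 1] ^ data[:-1, 2]        # x0(t+1) = x1(t) XOR x2(t)

h0, f0 = entropy_rate_and_free_entropy(data, 0)
print(f"total incoming transfer H_0 - F_0 = {h0 - f0:.3f} bits")
print(f"PTE 1->0 = {pairwise_te(data, 1, 0):.4f}, PTE 2->0 = {pairwise_te(data, 2, 0):.4f}")
```

This synergistic dependency is invisible to every pairwise measure, which is one way a system with three or more variables can defeat PTE.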



³ This relationship between transfer entropy and mutual information was originally pointed out in [7] without the residual term $R_{X,Y}$. In [6], the transfer entropy $T_{X \to Y}$ was defined as $T_{X \to Y} + R_{X,Y}$ in the current notation.

⁴ Note, however, that these theorems may not hold for an empirical MTE estimated with approximations (e.g., an assumed Markov chain and/or stationarity of the time series) when a dataset violates the assumed approximations.
Numerical and empirical studies

As numerical and empirical validation of MTE, we report simulation studies with two classes of nonlinear dynamical systems and two case studies of empirical data analysis. Nonlinear dynamical systems are an interesting testbed for demonstrating the characterization of the directed dependency structure of an unknown system with MTE. In a narrow sense, the generating process of a dynamical system is deterministic. Despite this determinism, its long-term chaotic behavior may be unpredictable, and it can be treated as a pseudo-random series given knowledge of the initial state at finite precision. Yet the variables retain local dependency at each time step. In each simulation, we generated a sufficiently long time series from a nonlinear dynamical system with a specific set of parameters. We then tested whether the intrinsic dependent relationships could be recovered on the basis of MTE or PTE applied to the generated time series.

In the empirical studies, we applied MTE to two empirical datasets. One is a physiological dataset that has been analyzed in multiple previous studies of nonlinear time series analysis. The other is a dataset of human body movements in complex actions. As these two multivariate time series illustrate, complex systems in general need to coordinate multiple subcomponents in order to maintain intermediate states that are neither perfectly static, periodic, nor chaotic. It is therefore of great interest to analyze the mutual relationships among their subcomponents.



Simulation 1: Lorenz system. In Simulation 1, we applied MTE to the Lorenz attractor, one of the best-known three-dimensional dynamical systems, defined by the following set of ordinary differential equations [24].

$$\begin{aligned} \frac{dx}{dt} &= \sigma (y - x) \\ \frac{dy}{dt} &= x(\rho - z) - y \\ \frac{dz}{dt} &= xy - \beta z \end{aligned} \tag{13}$$



where x, y, and z constitute the system state, t is time, and $\sigma$, $\rho$, and $\beta$ are the system parameters. Given these ordinary differential equations, we suppose that each equation reflects information flows from the past states (x, y, and z) at t to the next states after a short time lag ($t + \Delta t$). The question is then whether we can infer these dependencies by applying information-theoretic measures to the generated time series without prior knowledge of the differential equations. In the Lorenz equations, the derivatives $dy/dt$ and $dz/dt$ depend on all three variables, whereas $dx/dt$ depends on x and y but not on z. This asymmetric relationship, in which z depends on x but not vice versa, poses a challenge to measures of dependency in a multivariate system. A good measure of dependency should reject the (conditional) dependence from z to x and detect the others.
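A minimal sketch for generating such data is given below. It assumes Lorenz's classic parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$ and a fixed-step fourth-order Runge-Kutta integrator, neither of which is stated in this section. `lagged_pairs` forms the first-order Markov pairs $\{t, t + \Delta t\}$ for a chosen lag, and `discretize` is a hypothetical quantile coarse-graining that turns each coordinate into a few equiprobable symbols for plug-in estimators; all function names are illustrative.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Eq. (13); default parameters are assumed, not from the study."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate_lorenz(n_steps, dt=0.001, s0=(1.0, 1.0, 1.0)):
    """Fixed-step fourth-order Runge-Kutta integration; returns an (n_steps + 1, 3) array."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = s0
    for k in range(n_steps):
        s = traj[k]
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        traj[k + 1] = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

def lagged_pairs(series, dt, lag):
    """Rows at t and at t + lag, with lag rounded to a whole number of integration steps."""
    step = max(1, int(round(lag / dt)))
    return series[:-step], series[step:]

def discretize(series, n_bins=4):
    """Hypothetical coarse-graining: map each column to n_bins equiprobable symbols."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.quantile(series, qs, axis=0)            # shape (n_bins - 1, n_columns)
    return np.stack([np.searchsorted(edges[:, c], series[:, c])
                     for c in range(series.shape[1])], axis=1)

traj = simulate_lorenz(20_000, dt=0.001)[5_000:]       # drop the initial transient
symbols = discretize(traj)                             # same bin edges for all time points
now, later = lagged_pairs(symbols, dt=0.001, lag=0.01)
print(now.shape, later.shape, np.bincount(symbols[:, 0]))
```

With `dt=0.001`, the lags from 0.001 to 0.15 correspond to shifts of 1 to 150 integration steps; the symbol pairs can then be fed to a plug-in transfer entropy estimator.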

Because the Lorenz system is defined over continuous time, its time series was analyzed by systematically varying the time lag $\Delta t$ of the first-order Markov chain $\{t, t + \Delta t\}$ from 0.001 to 0.15. The upper and bottom panels of Figure 2 show the averages of MTEs and PTEs, respectively, as a function of the lag $\Delta t$. We performed statistical inference by taking zero MTE or PTE (conditional independence) as the null hypothesis (see Methods for details), and the upper confidence bounds of the theoretical zero MTE and PTE are shown as a solid line.
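The bounds above are derived in Methods. As a generic alternative that may differ from that procedure, the sketch below approximates a 99% upper bound for zero transfer entropy with time-shuffled surrogates of the source: shuffling preserves the source's marginal distribution but destroys its temporal relation to the target (and its own autocorrelation). The helper `surrogate_upper_bound` and its parameters are hypothetical, and the estimator is the one-step plug-in PTE.

```python
import numpy as np
from collections import Counter

def plugin_entropy(*cols):
    """Plug-in joint entropy (bits) of aligned discrete columns."""
    keys = zip(*(np.asarray(c).tolist() for c in cols))
    counts = np.array(list(Counter(keys).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def transfer_entropy(source, target):
    """One-step plug-in PTE (bits per step) from source to target."""
    x_next, x_now, y_now = target[1:], target[:-1], source[:-1]
    return (plugin_entropy(x_next, x_now) - plugin_entropy(x_now)
            - plugin_entropy(x_next, x_now, y_now) + plugin_entropy(x_now, y_now))

def surrogate_upper_bound(source, target, n_surrogates=100, q=0.99, seed=0):
    """q-quantile of PTE values obtained after randomly permuting the source in time."""
    rng = np.random.default_rng(seed)
    null = [transfer_entropy(rng.permutation(source), target) for _ in range(n_surrogates)]
    return float(np.quantile(null, q))

rng = np.random.default_rng(4)
y = rng.integers(0, 2, 3000)
x = np.zeros_like(y)
x[1:] = y[:-1]                       # X copies Y with a one-step lag
te_obs = transfer_entropy(y, x)
bound = surrogate_upper_bound(y, x)
print(f"PTE = {te_obs:.3f} bits, 99% null bound = {bound:.4f} bits, significant: {te_obs > bound}")
```

Because plug-in estimates are biased upward, comparing against a surrogate distribution built with the same estimator and sample size is more reliable than comparing against zero.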

The results showed that the MTE from variable z to x (highlighted in red), which should be zero in theory, was the lowest among the six directed pairs at all lags except $\Delta t < 0.01$ (Figure 2a). At lags between 0.06 and 0.07, the to-be-zero MTEs from z to x were around the upper confidence bound of the theoretical zero MTE. Meanwhile, the MTEs of the other five directed pairs were significantly positive at all lags. These results suggest that we could estimate the latent dependent relationships in the Lorenz system on the basis of MTE, except at very short lags.

In contrast, the PTE from variable z to x ranked in the middle of the six directed pairs (Figure 2b) and was significantly larger than the theoretical zero PTE at all lags $0.001 \le \Delta t \le 0.15$. Moreover, the PTEs from y to x and from x to y, which should be positive in theory, tended to be as low as the theoretical zero PTE at all lags. These results suggest that PTE not only overestimates to-be-zero information flows but also underestimates to-be-positive information flows. In sum, these simulations suggest that MTE may measure multivariate dependency structure more accurately than PTE, and they clearly demonstrate a potential limitation of PTE when applied to a system with three or more variables.

Simulation 2: Characterization of various dependent structures. 

Simulation 1 suggests an advantage of MTE and a potential limitation of PTE in the analysis of multivariate systems. In Simulation 2, we analyzed the robustness of MTE-based inference as a function of the number of variables, various topological types of dependency networks, and the effects of unobserved noisy variables. Specifically, we studied a class of coupled map lattices (CML), which allows us to manipulate the system parameters systematically. A CML is a class of nonlinear dynamical system that linearly combines multiple one-dimensional chaotic systems as subcomponents [25]. Although the behavior of each subsystem is relatively simple and well known, a CML also shows global emergent patterns across subsystems with a particular network topology among them. Owing to these properties, variants of coupled map lattices have been used to model various kinds of real-world phenomena, such as earthquakes [26], the form of neurons [27], traffic flow [28], open flow [29], convection [30], cell-gene interaction [31], and epileptic seizures [32]. Utilizing this controllability, we analyzed the robustness of MTE-based inference of system dependency.

Specifically, we studied a coupled tent map lattice (CTML), defined as follows. For $0 \le x_i^t \le 1$ ($i = 1, 2, \ldots, N$ and $t = 0, 1, \ldots, T$),



= / 



[14] 



where $\epsilon > 0$ is the coupling parameter indicating the degree of dependency in the system, $\delta_{ij}$ is either one or zero, indicating the existence of dependency from j to i, $\eta_i^t$ is noise, a random value drawn from a uniform distribution, and $f(x)$ is the so-called tent map, with $f(x) = 2x$ if $x < 1/2$ and $f(x) = 2 - 2x$ otherwise. As in Simulation 1, we define a positive coupling parameter between the variables $x_i^t$ and $x_j^{t+1}$ as a positive information flow from variable i to variable j. Given binary information flows, either positive or zero, there are 16 different types of dependency networks with directed edges, after identifying topologies that are symmetric under exchange of the three variables (Figure 3a). Each of the 16 diagrams corresponds to a $3 \times 3$ matrix of network topology $\delta_{ij}$ ($i, j = 1, 2, 3$) in Figure 3b. The colored $(i, j)$-cells in the matrices indicate the coupling parameters from $x_i^t$ to $x_j^{t+1}$ in Equation (14): white = 1, gray = $\epsilon$, and black = 0. Given the CTML, we systematically explored all possible dependency diagrams with three, four, and five variables. There are $2^{6}$, $2^{12}$, and $2^{20}$ possible combinations of inferences on binary information flows for each dataset, and 96, 2616, and 192160 directed pairs in 16, 218, and 9608 unique diagrams for three, four, and five variables, respectively (Table 1).
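The counts of unique diagrams quoted above are the numbers of directed graphs without self-loops, counted up to relabeling of the variables. As a sanity check, the brute-force sketch below enumerates all adjacency patterns and keeps one canonical representative per relabeling class; it reproduces the counts for three and four variables. The five-variable case ($2^{20}$ patterns times 120 permutations) is feasible but slow in pure Python. The function name is illustrative.

```python
import itertools

def count_unlabeled_digraphs(n):
    """Number of directed graphs on n nodes without self-loops, counted up to
    permutation of the nodes, by brute-force canonicalization."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    perms = list(itertools.permutations(range(n)))
    classes = set()
    for mask in range(2 ** len(pairs)):
        edges = [pairs[k] for k in range(len(pairs)) if (mask >> k) & 1]
        # Canonical form: the smallest sorted edge list over all relabelings of the nodes.
        canon = min(tuple(sorted((perm[i], perm[j]) for i, j in edges)) for perm in perms)
        classes.add(canon)
    return len(classes)

for n in (3, 4):
    diagrams = count_unlabeled_digraphs(n)
    print(n, diagrams, diagrams * n * (n - 1))   # variables, diagrams, directed pairs
```

The printed directed-pair totals, 96 and 2616, match the numbers above, since each diagram on N variables contributes $N(N-1)$ directed pairs.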

Given the latent positive or zero information flows of the CTML, we generated time series and computed MTE (PTE) for each directed pair. We then defined MTEs (PTEs) larger than the 99% upper confidence bound of the theoretical zero MTE (PTE) as significant positive information flows, and we counted a match between estimated and latent information flows as a correct inference. Figures 3c and 3d show the estimated dependent pairs in the three-variable CTML with coupling parameter $\epsilon = 0.2$ and without noise ($\eta_i^t = 0$ for all t). Significantly positive MTEs or PTEs are shown in gray, and the others in black (see Methods for the statistical inference). MTE gave correct inferences of dependency for all 96 directed pairs in the 16 diagrams (Figure 3d). In contrast, PTE overestimated dependency for six directed pairs in four diagrams, yielding the incorrect inferences highlighted in red in Figure 3c.

All the results of Simulation 2, including the four- and five-variable CTMLs, are summarized in Table 1. Case-based correctness was defined as correct inference for all the directed pairs in a diagram. For the four-variable CTML, the proportion of correct inferences based on MTE was 90.78% of the 218 cases and 97.78% of the 2616 directed pairs, whereas that based on PTE was 9.22% of the cases and 79.67% of the directed pairs. For the five-variable CTML, the proportion of correct inferences based on MTE was 81.48% of the 9608 cases and 95.41% of the 192160 directed pairs, whereas that based on PTE was 1.24% of the cases and 71.23% of the directed pairs. In sum, the simulations with the CTMLs demonstrated the robustness of MTE across varying numbers of variables, whereas inference based on PTE became less accurate as the number of variables increased.
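The two correctness criteria can be written down directly. The sketch assumes that true and estimated dependencies are stored as boolean arrays of shape (cases, N, N), with entry [c, i, j] marking a flow from variable j to variable i in diagram c (an arbitrary convention chosen for illustration); diagonal entries are ignored. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def inference_accuracy(true_flows, est_flows):
    """Case-based and pair-based proportions of correct inferences.

    true_flows, est_flows: boolean arrays of shape (cases, N, N); entry [c, i, j]
    marks a (true or estimated) flow from variable j to variable i. Diagonals are ignored.
    """
    true_flows = np.asarray(true_flows, dtype=bool)
    est_flows = np.asarray(est_flows, dtype=bool)
    n_cases, n, _ = true_flows.shape
    off_diag = ~np.eye(n, dtype=bool)
    correct = (true_flows == est_flows) | ~off_diag     # diagonal always counts as correct
    pair_based = (correct & off_diag).sum() / (n_cases * off_diag.sum())
    case_based = correct.all(axis=(1, 2)).mean()        # every directed pair in the diagram
    return float(case_based), float(pair_based)

truth = np.zeros((2, 3, 3), dtype=bool)
truth[:, 1, 0] = True                 # hypothetical: variable 0 drives variable 1 in both cases
estimate = truth.copy()
estimate[1, 2, 0] = True              # one false positive in the second case
print(inference_accuracy(truth, estimate))   # (0.5, 0.9166666666666666)
```

A single wrong pair costs a whole case under the case-based criterion, which is why case-based proportions fall much faster than pair-based ones as the number of variables grows.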

Robustness to unobserved variables 

In the next analysis, we tested the robustness of the MTEs when ap-
plied to a dataset with unobserved variables. One of the lessons derived
from Simulation 1 is that PTE for a system with three or more vari-
ables may be inaccurate. The same lesson could apply to the N-variable
MTE for systems with (N + 1) or more potential variables. Since, un-
like in the simulations, we cannot always observe a sufficient set
of variables in typical empirical data analyses, there is a potential
concern that MTE may not be better than PTE for datasets with a
potentially unobserved set of variables. Therefore, it is of practical
importance to evaluate the robustness of the measures for such datasets
with unobserved variables. In the simulation, we generated time
series from the CTML (Equation 14) with a noise term $\eta_i^t$, a random
value drawn from the uniform distribution on [0, 0.1] at each time step
t. The random input $\eta_i^t$ to the variable $x_i^t$ reflects perturbation by
the unobserved set of variables. With and without noisy unobserved
variables, we analyzed the average classification performance of the
MTE and PTE for the 16 cases of 3-variable CTMLs (Figure 3a) as a
function of the coupling rate ε from 0 to 0.6 (Figures 4a and 4b).
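A noisy three-variable CTML of this kind can be sketched in Python. The exact map (Equation 14) is given elsewhere in the paper, so the update below is an assumed, commonly used variant: each variable applies the tent map to a weighted mix of its own state and its drivers' states (weight ε per driver), then adds the noise $\eta_i^t \sim U(0, 0.1)$ modulo 1. The adjacency convention, the wrap-around, and all function names are hypothetical illustrations; the median split follows the symbolization described in Methods.

```python
import random
from typing import List, Sequence


def tent(x: float) -> float:
    """Tent map on [0, 1]."""
    return 2.0 * x if x < 0.5 else 2.0 * (1.0 - x)


def simulate_ctml(adj: Sequence[Sequence[bool]], eps: float, steps: int,
                  noise: float = 0.1, transient: int = 1000,
                  seed: int = 0) -> List[List[float]]:
    """Simulate a noisy coupled tent map lattice (assumed form, see text).

    adj[j][i] is True if variable j drives variable i. Returns `steps`
    samples per variable after discarding `transient` steps. Without noise,
    the pure tent map in floating point can collapse onto the fixed point 0;
    the noise term also prevents this.
    """
    rng = random.Random(seed)
    n = len(adj)
    x = [rng.random() for _ in range(n)]
    out: List[List[float]] = [[] for _ in range(n)]
    for t in range(transient + steps):
        new = []
        for i in range(n):
            drivers = [j for j in range(n) if j != i and adj[j][i]]
            arg = (1.0 - eps * len(drivers)) * x[i] + eps * sum(x[j] for j in drivers)
            eta = rng.uniform(0.0, noise)  # stands in for unobserved variables
            new.append((tent(arg % 1.0) + eta) % 1.0)
        x = new
        if t >= transient:
            for i in range(n):
                out[i].append(x[i])
    return out


def median_split(series: Sequence[float]) -> List[int]:
    """Binary symbols: 1 if the value exceeds the median, 0 otherwise."""
    s = sorted(series)
    m = len(s)
    med = s[m // 2] if m % 2 else 0.5 * (s[m // 2 - 1] + s[m // 2])
    return [1 if v > med else 0 for v in series]
```

For example, `simulate_ctml([[False, True, False], [False, False, True], [False, False, False]], eps=0.2, steps=500)` generates a chain 1 → 2 → 3 (zero-based indices 0 → 1 → 2).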

In the analysis of the dataset without unobserved variables, PTEs
yielded correct inferences in all the cases within a small range of
coupling parameters, 0.05 < ε < 0.1 (Figure 4a). Likewise, MTEs
yielded correct inferences in all the cases within a relatively broader
range, 0.1 < ε < 0.25 (Figure 4a). Since too large coupling parameters
(ε > 0.25) made multiple variables nearly perfectly coupled (R > 0.95),
those coupled variables were difficult to discriminate on the basis of
MTEs with their coarse-grained encoding of the phase space. In fact,
we found at least one false detection due to nearly perfect coupling
in Cases 10, 11, 13, 14, and 15 (Figure 3a). In contrast, the advantage
of PTEs at small coupling parameters is likely caused by the relatively
small effects of the third variable. In addition, MTE needs to estimate
the probabilistic distribution over a large combinatorial space relative
to the given sample size. This sparseness of samples makes MTE more
conservative in detecting information flows. Except for this advantage
of PTE at small coupling rates, MTE outperformed PTE in most of the
cases and parameters.

Simulation of the datasets with a noisy latent variable showed patterns
basically similar to those of the noiseless dataset. In Figure 4b, we
found an advantage of PTE at small coupling parameters (ε = 0.1 or
smaller) and better detection performance of MTE otherwise. We also
found a remarkable difference from the noiseless dataset: MTE tended
to perform even better, while PTE tended to perform worse, on the
dataset with the noisy latent variable. In this particular simulation,
the even better performance of MTE on the noisy data was perhaps
because the noise decoupled the perfectly coupled variables (at
ε > 0.25) that had harmed MTE performance on the noiseless dataset.
In sum, these results suggest a robust relative advantage of MTE even
with noisy latent variables.

Summary of numerical studies 

We summarize the findings of Simulations 1 and 2 in four points.
First, MTE showed advantages over PTE in both a continuous-time
system that is asymmetric under exchange of variables (Simulation 1)
and discrete-time systems (Simulation 2). Second, MTE also showed
advantages for various types of dependency topology in the CTMLs with
3, 4, and 5 variables. Third, its advantage is robust even in the
analysis of datasets with unobserved noisy variables. Finally, the
simulations also revealed a limitation of the MTE: it estimates
information flows more conservatively than PTE. We discuss this
technical limitation in a later section.

Empirical data analysis 1: Physiological data. As a case study, we
analyzed a physiological dataset including three vital signs recorded
in a sleeping person [33, 34]. Besides being a trivariate time series,
this dataset was chosen as a benchmark test because it has been
analyzed in many theoretical studies [2, 3, 35, 36]. The original data
consist of three time series of heart rates, breath rates, and blood
oxygen concentration recorded at a sampling rate of 0.5 Hz. The person
measured is known to show respiratory sinus arrhythmia, a frequently
observed condition characterized by correlation between heart rates
and breath rates. Consistent with this, a previous study applying PTE
showed that heart rates and breath rates transferred information
bidirectionally [3]. However, as Simulations 1 and 2 suggest, such
apparent information transfer may be caused by a third factor, for
example, the blood oxygen concentration in this dataset. Thus, we
reanalyzed the dataset not as a bivariate system but as part of a
trivariate system by applying the MTE.

Figures 5a and 5b respectively show the MTE (given the blood oxygen
concentration) and the PTE (as bivariate series) between heart rates
and breath rates as a function of time lag. In Figure 5b, we replicated
the qualitative patterns of PTEs found in the previous study: both
directions show information transfer at most time lags, while the heart
rate tended to transfer more information to the breath rate than in
the other direction^.

The qualitative patterns of PTE and MTE basically agreed: heart rates
and breath rates were tightly coupled bidirectionally whether or not
blood oxygen concentration was taken into account. With regard to
these qualitative patterns, this result confirmed the conclusion of the
previous study even when the signals were treated as part of a
trivariate system. However, we also found a difference between the two
measures. In MTE, information transfer in both directions between the
heart rates and the breath rates peaked at about the same time lag,
approximately 2 sec. In PTE, on the other hand, the two directions
peaked at quite different time lags: PTE from heart to breath peaked at
approximately 2 sec, and PTE from breath to heart at approximately
20 sec. At present, we cannot conclude which result, the synchronized
peaks in MTE or the delayed peaks in PTE, is more plausible in light of
empirical findings. This remains an open question for further empirical
studies.

^The results of the present and previous studies may disagree because
of technical differences in the estimation method and in the choice of
a particular subset of the time series.

Empirical data analysis 2: Motor coordination in complex actions.

The second case study is an analysis of complex human actions. Our
bodily actions require coordinated movements of multiple body parts.
A human body consists of over two hundred bones, numerous muscles,
and billions of neurons in the central and peripheral nervous systems
controlling them with feedback loops. Making a smooth action obviously
requires integrated control over all levels of these systems. We aim
to characterize human motor coordination in skillful actions through
the MTE. Specifically, we chose a dataset of complex actions performed
by multiple players with different levels of expertise. The data were
originally obtained to analyze levels of expertise in samba music
performance [37, 38]. The original dataset consists of five players,
each of whom performed basic samba shaking actions at five different
tempos (60, 75, 90, 105, and 120 beats per minute; each trial lasted
97.4 seconds on average), cued by a metronome. While they played, the
three-dimensional motions of 18 markers attached to body parts and
musical instruments were recorded at a sampling rate of 86.1 Hz.

As in the original study, here we aim to find the relationship between
informational properties of bodily actions and expertise levels in the
motor skill. The present study analyzed the actions of three of the
original five players: one master player (more than thirty years of
experience) and two of his disciples, Disciple 1 (six years of
experience) and Disciple 2 (two years of experience). The expertise
levels of the master and his disciples were expected to differ. Given
our knowledge of the players' expertise levels as the ground truth, we
tested whether MTE can successfully detect the differences in their
skill levels. For simplicity, we limited the analysis to a subset of
the original dataset: 3190 samples (74.1 seconds long) of the motions
of four markers attached to the right wrist, the right elbow, and the
two sides of the musical instrument (shaker). These were the essential
parts of the samba actions, directly producing sounds, and we expected
that information flows among them would be crucial to characterize the
players' expertise levels. In a smooth samba performance, multiple
body parts need to be coordinated to perform the complex actions.
Thus, as is perhaps common in multivariate time series analyses in
general, one of the challenges in this analysis is to decompose the
smooth actions into information flows between body parts.

In the analysis, we applied the MTE and PTE to all the directed pairs
of the four motions. Figure 6 shows the proportion of directed pairs
with significantly positive MTE and PTE averaged across the five tempo
conditions (12 × 5 = 60 directed pairs in total for each subject). The
results showed distinguishable patterns of informational coupling among
the master and the two disciples. Across all five tempo conditions, we
found all the body parts of the master player nearly perfectly coupled.
Meanwhile, Disciple 2, with the least experience of the three, showed
the fewest informationally coupled pairs. In Figure 6, each graph on
the top shows the information network of each player. It has a solid
edge between two markers if at least one of the two directions had a
significantly positive MTE across all five tempo conditions. The graph
of Disciple 2 shows that the only consistently coupled pair was his
wrist and one side of the shaker. That of Disciple 1 showed three
edges, elbow-wrist, wrist-shaker2, and shaker2-shaker1, which suggests
that these physically connected parts formed a whip-like stroking
action. As expected, these results showed consistency between the
players' expertise levels and the MTE-based informational properties
of their bodily actions.
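The two summaries shown in Figure 6 can be sketched as follows, assuming that the significance outcome for each tempo condition is stored as a 4 × 4 boolean matrix `sig[i][j]` (true if the flow from marker i to marker j is significant). We read the edge rule as "in every one of the five conditions, at least one direction is significant"; this interpretation, the data layout, and the function names are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Sig = Sequence[Sequence[bool]]  # sig[i][j]: significant flow i -> j


def proportion_summary(per_condition: List[Sig]) -> Tuple[float, float, float]:
    """Mean, min, and max proportion of significant directed pairs across conditions."""
    props = []
    for sig in per_condition:
        n = len(sig)
        hits = sum(sig[i][j] for i in range(n) for j in range(n) if i != j)
        props.append(hits / (n * (n - 1)))  # 12 directed pairs for 4 markers
    return sum(props) / len(props), min(props), max(props)


def consistent_edges(per_condition: List[Sig], names: Sequence[str]) -> List[Tuple[str, str]]:
    """Undirected edges whose pair has at least one significant direction in every condition."""
    edges = []
    for a, b in combinations(range(len(names)), 2):
        if all(sig[a][b] or sig[b][a] for sig in per_condition):
            edges.append((names[a], names[b]))
    return edges
```

The mean and the min/max correspond to the bar height and the error bars described in the Figure 6 caption; the edge list corresponds to the graph drawn above each bar.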

As a comparison, we also applied PTE to the same dataset (Fig-
ure 6). PTE did not detect the differences between the master and
Disciple 1, both of whom showed significant PTEs in all the directed
pairs. Compared with the MTE, PTE tended to overestimate the coupled
pairs in all three players. In the graph patterns, PTE detected
positive information flow between pairs for which MTE did not. Given
our knowledge of the subjects' expertise levels, the PTE estimates were
likely to include false positive information flows due to the effects
of the two other, unconsidered variables. As a result, we could not
find differences in the PTEs among players as clearly as in the MTEs.
These results of the empirical data analyses suggest the potential
applicability of MTE to empirical complex multivariate time series.



Discussion 

The present study proposes an extended information-theoretic measure
for system characterization through time series. The multivariate
transfer entropy is a natural generalization of pairwise transfer
entropy to multivariate systems that retains a set of theoretical
properties. The MTE was tested on two classes of nonlinear dynamical
systems and on two empirical datasets. The simulations demonstrated
the advantage of MTE over PTE in both discrete-time and continuous-time
systems, for most of the dependency topologies among 3, 4, and 5
variables, and even with an additional noisy latent variable. These
advantages of MTE likely stem from its theoretical property of
decomposing higher-order dependencies into information flows in a
network. Since the PTE does not always satisfy this property for a
system with three or more variables, the PTE from A to B may take a
value independently of the PTE from A to C. As a result, the PTE from
A to B may overestimate or underestimate the dependency between A and
B when a third variable C also affects B.
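The third-variable argument can be made concrete with a small, hypothetical simulation in Python. A persistent binary driver Z is copied, with 10% bit flips, into both X and Y one step later, so there is no direct coupling from X to Y. The sketch assumes that, for a single pair, the MTE can be approximated by transfer entropy conditioned on the present state of the remaining variable, $I(Y_{t+1}; X_t \mid Y_t, Z_t)$, with first-order histories and plug-in probabilities. This is an illustrative stand-in, not necessarily the exact MTE defined in this paper.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def cond_mutual_info(a: Sequence[Hashable], b: Sequence[Hashable],
                     c: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(A; B | C) in bits from aligned symbol sequences."""
    n = len(a)
    abc = Counter(zip(a, b, c))
    ac = Counter(zip(a, c))
    bc = Counter(zip(b, c))
    cc = Counter(c)
    # sum p(a,b,c) log2[ p(a,b,c) p(c) / (p(a,c) p(b,c)) ], written with counts
    return sum(k / n * math.log2(k * cc[z] / (ac[(x, z)] * bc[(y, z)]))
               for (x, y, z), k in abc.items())


def pairwise_te(src: Sequence[int], dst: Sequence[int]) -> float:
    """PTE src -> dst with first-order histories: I(dst_{t+1}; src_t | dst_t)."""
    return cond_mutual_info(dst[1:], src[:-1], dst[:-1])


def conditional_te(src: Sequence[int], dst: Sequence[int],
                   others: List[Sequence[int]]) -> float:
    """TE src -> dst given the other variables' present states (assumed MTE form)."""
    cond = list(zip(dst[:-1], *(o[:-1] for o in others)))
    return cond_mutual_info(dst[1:], src[:-1], cond)


def common_driver(n: int, persist: float = 0.9, flip: float = 0.1,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Z is a persistent binary chain; X and Y are noisy one-step-delayed copies of Z."""
    rng = random.Random(seed)
    z = [rng.randint(0, 1)]
    for _ in range(n - 1):
        z.append(z[-1] if rng.random() < persist else 1 - z[-1])

    def noisy(v: int) -> int:
        return v ^ int(rng.random() < flip)

    x = [rng.randint(0, 1)] + [noisy(z[t]) for t in range(n - 1)]
    y = [rng.randint(0, 1)] + [noisy(z[t]) for t in range(n - 1)]
    return x, y, z
```

With 100,000 samples, `pairwise_te(x, y)` is clearly positive (about 0.05 bits in theory for these settings), whereas `conditional_te(x, y, [z])` stays close to zero, apart from a small finite-sample bias, because $Y_{t+1}$ is independent of $X_t$ once $Z_t$ is known.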

Application to the two empirical datasets suggested its potential
use in the analysis of empirical complex systems, including physio-
logical signals and human motor coordination. In the analysis of such
datasets with complex interactions, MTE is likely to be useful because
it measures the dependency of each pair of variables while cancelling
out the effects of the other variables. This general applicability of
MTE to multivariate systems extends to the broad range of empirical and
theoretical fields in which PTEs are used [8, 9, 10, 11, 12, 13, 14, 16,
17, 18, 19, 20].

Technical limitations and future work. In contrast to its relatively
robust and accurate evaluations, the present simulation studies also
suggested a limitation of the MTE. The analysis in Simulation 2
showed that the MTE was more conservative in detecting dependency
than the PTE. This problem was likely caused by a technical issue in
its estimation. The N-variable MTE requires the conditional entropy of
N variables, whose combinatorial space may grow exponentially with the
number of variables. This causes high computational costs and
inaccurate estimation, owing to the sparsity of samples relative to the
exponentially growing space. In the current implementation, which was
computed in a naive, unoptimized way, computation was very costly even
for a relatively modest number of variables (N > 5). This estimation
problem prevented us from using finer-grained binning of the phase
space, and we suggest that it resulted in the conservative detection of
information by MTE. Therefore, further work could include developing a
technique that relaxes this problem. Similar technical issues have been
discussed for PTE, such as a small-sample correction for pairwise
transfer entropy [15] or a non-parametric probabilistic density
estimator for continuous time series [39].
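The exponential growth described above is easy to quantify. The following Python sketch counts the cells of a joint symbol space of N variables over a window of K + 1 steps, $(\prod_{i=1}^{N} M_i)^{K+1}$, and the average number of the L - K available windows per cell. The function names and the sample length in the note below are hypothetical, not values taken from the paper.

```python
from typing import Sequence


def joint_states(alphabet_sizes: Sequence[int], k: int) -> int:
    """Number of cells in the joint symbol space over a window of K + 1 steps."""
    cells = 1
    for m in alphabet_sizes:
        cells *= m  # product of alphabet sizes at one time step
    return cells ** (k + 1)


def samples_per_cell(length: int, alphabet_sizes: Sequence[int], k: int) -> float:
    """Average number of the L - K windows falling into each cell."""
    return (length - k) / joint_states(alphabet_sizes, k)
```

For binary symbols and a third-order chain, three variables give 2^12 = 4096 cells, while five variables give 2^20 = 1,048,576 cells. Even a hypothetical series of 100,000 samples therefore leaves fewer than 0.1 windows per cell in the five-variable case.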

Another related concern in empirical analyses is parameter spec-
ification. To measure dependency accurately, we need to specify the
temporal delay and the estimators of probabilistic distributions.
The current simulations used one of the simplest probabilistic models:
binary coding by median splitting with a first-order Markov chain. It
is an open question to what extent estimation of the MTE depends on
the choice of these parameters, which likely varies from case to case.
More importantly, how can we choose the probabilistic model and
delays? One potential solution to this problem for dynamical systems
is generating partitions. A generating partition gives a theoretical
ground for the "best" discrete states of a discrete or continuous
dynamical system, with a one-to-one correspondence between a series of
symbols and a subset of the state space. Generating partitions have
been constructed for several low-dimensional chaotic systems (e.g.,
coupled map lattice [40] and Hénon map [41]), and several algorithms
estimating symbolic dynamics from an empirical time series have been
proposed [42, 43, 44].



Methods 

In all the analyses in the present study, a set of continuous time se-
ries of T samples of N variables was converted to a symbolic form
encoding the original data in a coarse-grained representation (see
each analysis below for the specific symbolization process). Based
on the symbolic series, the probabilistic distribution of the time
series was estimated assuming a K-th order Markov chain, and MTE and
PTE were then estimated from this distribution (see Estimation of
probabilistic distribution below for details). The estimated MTE (PTE)
was compared with the corresponding theoretical zero MTE (PTE) (see
Zero transfer entropy below). All the computational routines used in
the simulations and empirical data analyses were written on the MATLAB
platform and are available at the author's website
(http://www.jaist.ac.jp/~shhidaka/).

Estimation of probabilistic distribution. In each simulation above,
an array of values $x_{i,t}$ for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, L$
was given as the N-dimensional time series of length L. Each value in
dimension i was symbolized as $\hat{x}_{i,t} = f_i(x_{i,t})$, where
$f_i: \mathbb{R} \to \mathcal{M}_i$ is the symbolization function mapping the
one-dimensional real space to the symbol set of $M_i$ alphabets,
$\mathcal{M}_i = \{1, 2, \ldots, M_i\}$, which is specified in each simulation.
The N-tuple $y_t = \{\hat{x}_{1,t}, \hat{x}_{2,t}, \ldots, \hat{x}_{N,t}\} \in \mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \cdots \otimes \mathcal{M}_N$
was treated as a point in the joint space of N symbols. Assuming
stationarity and K-th order Markovity, the conditional probabilistic
distribution $P(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-K})$ over
$\{t-K, t-K+1, \ldots, t\}$ was estimated by its maximum likelihood
estimator (MLE), assuming a multinomial distribution over the joint
symbol space $\mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \cdots \otimes \mathcal{M}_N$.
Specifically, the MLE is the frequency over the joint symbol space
$(\mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \cdots \otimes \mathcal{M}_N)^{K+1}$,
normalized to a probability. Under the stationarity assumption,
$P(y_{t_1} \mid y_{t_1-1}, \ldots, y_{t_1-K}) = P(y_{t_2} \mid y_{t_2-1}, \ldots, y_{t_2-K})$
for any $t_1$ and $t_2$ in $\{K+1, \ldots, L\}$. Thus, each series within a
time window of length K + 1, $z_t^{K+1} = \{y_t, y_{t+1}, \ldots, y_{t+K}\}$,
was counted across $t = 1, 2, \ldots, L - K$, so that a dataset of length
L provides L - K samples for estimating the conditional probabilistic
distribution $P(y_t \mid y_{t-1}, \ldots, y_{t-K})$. Since the joint symbol
space has a large number of possible combinations,
$\left(\prod_{i=1}^{N} M_i\right)^{K+1}$, which grows exponentially with the
dimension N and the time window length K, an empirical dataset with a
limited sample size may be too sparse to estimate the probabilistic
distribution over the joint symbolic space. Therefore, another
reasonable choice for sparse data would be an estimator combined with
one of the various smoothing techniques for the n-gram model. The
author confirmed that modified Kneser-Ney smoothing [45] was effective,
in particular for data with small sample sizes, although the present
paper reports the MLE in all the analyses.
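The window-counting MLE can be sketched in Python as follows. Symbols are assumed to be already computed (for example, by median splitting) and stored as N lists of integers of equal length L. The function names and the dictionary layout of the result are illustrative choices, not the author's MATLAB routines, and no smoothing (such as Kneser-Ney) is applied.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Symbol = Tuple[int, ...]  # y_t: the N-tuple of symbols at one time step


def joint_tuples(symbols: Sequence[Sequence[int]]) -> List[Symbol]:
    """Turn N symbol series of length L into the series of N-tuples y_1..y_L."""
    return list(zip(*symbols))


def markov_mle(symbols: Sequence[Sequence[int]],
               k: int) -> Dict[Tuple[Tuple[Symbol, ...], Symbol], float]:
    """MLE of P(next | previous K joint symbols) from the L - K windows of length K + 1.

    Keys are (history, next), where history = (y_t, ..., y_{t+K-1}) and
    next = y_{t+K}; under stationarity the same table applies at every t.
    """
    y = joint_tuples(symbols)
    windows = Counter(tuple(y[t:t + k + 1]) for t in range(len(y) - k))
    history: Counter = Counter()
    for w, c in windows.items():
        history[w[:-1]] += c  # marginal count of each K-step history
    return {(w[:-1], w[-1]): c / history[w[:-1]] for w, c in windows.items()}
```

Because probabilities are normalized within each history, they sum to one for every observed history. Histories that never occur simply have no entry, which is where sparse data become a problem.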

Zero transfer entropy. The estimated transfer entropy was compared
with the null hypothesis that the true (pairwise or multivariate)
transfer entropy is zero, meaning conditional independence of the
given pair of variables. Note that, even under this null hypothesis,
the estimated transfer entropy may be positive due to the finite sample
size used for estimation. The probabilistic distribution of the null
transfer entropy (as a special case of conditional mutual information)
approximately follows a gamma distribution with shape parameter $s_1$
and scale parameter $s_2$ [46]. The parameters $s_1$ and $s_2$ may vary
across simulations due to specific features of the samples, but their
maxima are X and Y in Simulation 1, Z and W in Simulation 2, and XX and
XX in the empirical data analysis.
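As a hedged illustration of this test, the sketch below assumes the standard large-sample result that $2 n \ln 2$ times a plug-in conditional mutual information in bits is approximately $\chi^2$-distributed with df degrees of freedom under the null. That is a gamma distribution with shape df/2 and scale $1/(n \ln 2)$ on the bit scale. For binary symbols, df = (2 - 1)(2 - 1) × (number of conditioning states), which is even whenever at least one binary variable is conditioned on, so the survival function has a closed Erlang form. The shape $s_1$ and scale $s_2$ actually used in the paper may differ, and the function names are hypothetical.

```python
import math


def te_degrees_of_freedom(n_target: int, n_source: int, n_cond_states: int) -> int:
    """Degrees of freedom of the chi-square approximation for I(target; source | cond)."""
    return (n_target - 1) * (n_source - 1) * n_cond_states


def null_p_value(te_bits: float, n: int, df: int) -> float:
    """P(TE_null >= te_bits) under a Gamma(df / 2, 1 / (n ln 2)) null with even df."""
    if df % 2:
        raise ValueError("closed form needs an even df (integer gamma shape)")
    shape = df // 2
    x = te_bits * n * math.log(2)  # TE expressed in units of the scale parameter
    term = math.exp(-x)            # e^{-x} x^0 / 0!
    total = 0.0
    for k in range(shape):         # Erlang survival: sum_{k<shape} e^{-x} x^k / k!
        total += term
        term *= x / (k + 1)
    return total
```

For example, a first-order binary PTE conditions on the two states of the target's past, so df = 2 and the p-value reduces to $e^{-x}$. An estimate of 0.001 bits from 1000 samples then gives p = 0.5, far from significance at p < 0.001.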



Simulation 1. In Simulation 1, we generated a time series from the
Lorenz system from $t_0 = 50$ to $t = 2050$ with the initial value
$\{x(t_0), y(t_0), z(t_0)\} = \{1 + \eta_1, 1/2 + \eta_2, 0 + \eta_3\}$ and the parameters
$\{\sigma, \rho, \beta\} = \{10, 28, 8/3\}$ (Equation 13), where each of the noise
factors $\{\eta_1, \eta_2, \eta_3\}$ is a random value drawn from the uniform
distribution from 0 to 0.01. Using an ordinary differential equation
solver (the ode45 routine in MATLAB), we obtained approximately
118,000 samples of the three-dimensional series and resampled the time
series by linear interpolation at the desired temporal resolution, from
0.001 to 0.15 per sample. For each given temporal resolution Δt, we
obtained 300,000 samples by taking a subset of c sets of the time
series with different initial values, where c = [ ^"2000^* ] and [x] is
the maximum integer equal to or smaller than x. Given a set of
three-dimensional time series, we formed the probabilistic distribution
by converting each variable to a binary series $s_i^t$ of dimension i and
time t, splitting at the median of each dimension. The first-order
Markov chain $p(s_i^t, s_i^{t+1})$ (for i = 1, 2, and 3) was used for MTE
and TE estimation.
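For readers who want to reproduce this setup without MATLAB, here is a minimal Python sketch of the data generation and symbolization in Simulation 1. It assumes that Equation 13 is the standard Lorenz system with σ = 10, ρ = 28, β = 8/3. It also swaps MATLAB's adaptive ode45 for a fixed-step fourth-order Runge-Kutta integrator, which is a substitution, not the paper's solver, and it omits the resampling and subset-taking steps. The step size and run length used below are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, float, float]


def lorenz_rhs(s: Sequence[float], sigma: float = 10.0, rho: float = 28.0,
               beta: float = 8.0 / 3.0) -> State:
    """Right-hand side of the standard Lorenz equations."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_lorenz(s0: State, dt: float, steps: int) -> List[State]:
    """Fixed-step RK4 integration (a stand-in for MATLAB's adaptive ode45)."""
    out = [s0]
    s = s0
    for _ in range(steps):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs([a + 0.5 * dt * b for a, b in zip(s, k1)])
        k3 = lorenz_rhs([a + 0.5 * dt * b for a, b in zip(s, k2)])
        k4 = lorenz_rhs([a + dt * b for a, b in zip(s, k3)])
        s = tuple(a + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
                  for a, p, q, r, w in zip(s, k1, k2, k3, k4))
        out.append(s)
    return out


def binarize_median(col: Sequence[float]) -> List[int]:
    """Median split: 1 above the median of the series, 0 otherwise."""
    srt = sorted(col)
    m = len(srt)
    med = srt[m // 2] if m % 2 else 0.5 * (srt[m // 2 - 1] + srt[m // 2])
    return [1 if v > med else 0 for v in col]
```

With dt = 0.01, discarding the first 5000 steps plays the role of starting at $t_0 = 50$. Each of the three coordinates can then be passed to `binarize_median` to obtain the binary series $s_i^t$.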

Simulation 2. We generated time series $x_i^t$ of the coupled tent map
lattice based on Equation 14 with a given set of parameters (the
coupling parameter ε and $\delta_{ij}$ for $i, j = 1, 2, \ldots, N$ in Equation 14
of N variables), a set of random values $\eta_i^t$ drawn from (0, r), where
r is either 0 or 0.1, and a set of random initial values $x_i^1$ drawn from
the uniform distribution on (0, 1). In each case, a time series of 10'
steps, obtained after the first 1000 samples were discarded as
transient, was converted to binary values $s_i^t$ by median splitting,
$s_i^t = h(x_i^t - \tilde{x}_i)$, where $\tilde{x}_i$ is the median of $x_i^t$ ($t = 1, 2, \ldots, 10'$)
and the Heaviside function h(x) is 1 if x is positive and 0 otherwise.
The third-order Markov chain $p(s_i^t, s_i^{t+1}, s_i^{t+2}, s_i^{t+3})$ (for
i = 1, 2, 3) was used for MTE and TE estimation.

Empirical data analysis 1. We analyzed data set B, the trivariate time
series of heart rates, breath rates, and blood oxygen concentration,
retrieved from the Santa Fe time series competition
(http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html).
We concatenated all the consecutive time series segments longer than
250 seconds in the waking states diagnosed by the expert, and made a
trivariate time series of 11560 samples. For each given temporal
resolution Δt, we obtained 20,000 samples by taking subsets of c sets of
the time series with different lags $\{t_0 + t, t_0 + t + \Delta t, \ldots, t_0 + t + k\Delta t\}$,
where t = Q,^,..., c = [222^], and [x] is the maximum integer equal to or
smaller than x.

Empirical data analysis 2. The dataset consists of three players per-
forming at five different tempos (60, 75, 90, 105, and 120 beats per
minute; each trial lasted 97.4 seconds on average). The movements of
four markers attached to the right elbow, the right wrist, and the two
sides of the musical instrument, each originally recorded at 86.1 Hz,
were analyzed. In the analysis, after down-sampling the original data
to 43.05 Hz, the first 250 samples (5.81 seconds from the beginning of
the recording) were excluded as the initial setup of the actions, and
3250 samples (75.5 seconds long) of movements were analyzed for each
subject. To reduce measurement noise, the local linear projective
method was applied to each marker movement after phase space
reconstruction of each time series in a 31-dimensional time delay space
with a delay of 46 msec (i.e., $\{t, t + \Delta t, t + 2\Delta t, \ldots, t + 30\Delta t\}$
where $\Delta t = 46$ msec) [2]. For each estimated phase space, a symbol
series was assigned using the symbolic false nearest neighbor method
[43], which estimates a generating partition for a time series. Given
the four-variable symbolic series, we applied the multivariate transfer
entropy. Estimated MTEs greater than the zero MTE at the level of
p < 0.001 were defined as significant MTEs for each directed pair of the
four body parts in each condition.
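The time-delay reconstruction step can be sketched in Python. The sketch assumes that the down-sampled rate is half of 86.1 Hz (43.05 Hz), so that a 46 msec delay corresponds to about two samples. The helper name is hypothetical, and the subsequent noise reduction and symbolic false nearest neighbor steps are not shown.

```python
from typing import List, Sequence


def delay_embed(series: Sequence[float], dim: int, lag: int) -> List[List[float]]:
    """Rows are [x_t, x_{t+lag}, ..., x_{t+(dim-1)lag}]; there are T - (dim-1)*lag rows."""
    span = (dim - 1) * lag
    if len(series) <= span:
        raise ValueError("series too short for this embedding")
    return [[series[t + d * lag] for d in range(dim)]
            for t in range(len(series) - span)]
```

Under these assumptions, embedding 3250 samples in 31 dimensions with a two-sample lag leaves 3250 - 30 × 2 = 3190 delay vectors. This matches the 3190 samples reported in the Results.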
Fig. 1. (a) Unidirectional and (b) bidirectional information transmission between two vari-
ables X and Y. (c) The extended bidirectional information transmission among three vari-
ables, and (d) local incoming and outgoing information flows in a part of a general
N-variable bidirectional information network.



Table 1. Summary of the inference performance of information flows in
the coupled map lattice with 3, 4 and 5 variables and coupling parameter
ε = 0.2.

     Simulation settings (Comb.)   Multivariate TE        Pairwise TE
N    Cases      Pairs              Cases      Pairs       Cases      Pairs
3    16         96                 100%       100%        75.0%      93.75%
4    218        2616               90.78%     97.78%      9.22%      70.67%
5    9608       192160             81.48%     95.41%      1.24%      71.23%
Panels: (a) Multivariate Transfer Entropy; (b) Pairwise Transfer Entropy.

Fig. 2. The estimated PTE and MTE for the Lorenz attractor as a function of time lag Δt. The
directed pair from z to x, which is zero in theory, is highlighted in red.
Panels: (a) Information flow diagrams; (b) Coupling parameters; (c) Pairwise transfer entropy;
(d) Multivariate transfer entropy.

Fig. 3. (a) 16 possible diagrams of information flows (direction indicated by the arrows)
among the three variables 1, 2, and 3. (b) The matrices of coupling parameters in the
coupled tent map lattice corresponding to the 16 information flow diagrams (white diagonal
cell = 1, gray = ε, and black = 0). (c-d) The matrices of estimated information flows, in which
significant positive PTEs and MTEs are in gray, non-significant ones are in black, and
untested ones (diagonal cells) are in white. The red outlines of matrices highlight cases
with misdetection of information flows.
Panels: (a) CTML without noise (N=3); (b) CTML with noise (N=3).

Fig. 4. The correct detection of information flows (with directed pairs as the unit) as a function of
the coupling parameter, averaged across the 16 cases of the coupled tent map lattice with three
variables, (a) without noise ($\eta_i^t = 0$) and (b) with noise ($0 < \eta_i^t < 0.1$).
Fig. 5. The estimated information transfer between heart rates and breath rates as a function of
lag (sec), with MTE given blood oxygen concentration (left) and PTE (right).




Fig. 6. The proportion of coupled bodily movements estimated with MTE and PTE. The
proportions for each subject are averaged across 5 different tempo conditions, and their
maximum and minimum are shown as error bars. Each graph on top of a bar shows the
informational connectivity among four bodily movements, in which an edge indicates that
either or both directions have significant MTEs or PTEs across all five conditions.